Thirty Days of Metal — Day 24: Transparency

Warren Moore
May 5, 2022

This series of posts is my attempt to present the Metal graphics programming framework in small, bite-sized chunks for Swift app developers who haven’t done GPU programming before.

If you want to work through this series in order, start here. To download the sample code for this article, go here.

So far, we have only rendered opaque surfaces, objects that do not transmit any light through them. But many substances in the real world, from glass to water to gemstones, allow some light to pass through them, often tinting the light via selective absorption. We call such substances transparent.

You might be used to the term transparent referring to completely “clear” objects, while objects that allow some light through are called “translucent.” In this text, we will reserve the word “translucent” for substances that scatter light while transmitting it, and “transparent” will only be applied to substances that allow light through without seeming to scatter it. This latter property is sometimes called “optical transparency.”

As with all techniques in real-time graphics, we will use approximations to model light and materials in our treatment of transparency. In particular, in this article, all transparent surfaces will be treated as if they are infinitely thin. This implies that they will not exhibit scattering, nor selective absorption during transmission.

We will ultimately implement transparency in this article with a technique called alpha blending, which allows us to combine colors that are already in the frame buffer with partially transparent fragment colors, creating the illusion of transparent objects.

Alpha blending can be understood in the context of the broader subject of compositing. First, we will look more closely at the alpha channel, then learn from the computer graphics literature how to use it.

The Alpha Channel

Many images and textures include a channel called the alpha channel, which we have mostly ignored up until this point. Alongside the red, green, and blue components of a color, the alpha component represents the color’s opacity. By opacity, we mean the degree to which a surface prevents light from being transmitted. We call the inverse of this property the transparency, the degree to which a surface allows transmission.

There is another possible interpretation of the alpha channel. We can also think of alpha as representing coverage, the degree to which a surface “exists” at a given point. Imagine the mesh on a screen door: from a certain distance, such a surface restricts the transmission of light in almost the same way as tinted glass. The mesh selectively transmits light by blocking it entirely where the mesh is present, or allowing it to pass entirely where the mesh is absent. The aggregate effect is that the light is dimmed as if the door were partially transparent.

It turns out that both of these perspectives (the “opacity” perspective and the “coverage” perspective) are valid and useful at different times. In the remainder of this article, we will mostly use the opacity perspective.

A Theory of Composition

Compositing is the practice of combining colors or elements of two or more images. A familiar example is a weatherperson standing in front of a green screen so that an animated map can be rendered behind them: by using the green portions of the camera frame as a mask, the compositing system selectively draws the map “behind” the presenter.

One early work on compositing in the context of computer graphics is Porter and Duff’s paper “Compositing Digital Images” (1984). The Porter–Duff composition model provides a mathematical framework for combining partially transparent elements together to achieve a variety of effects. The various ways of combining elements are called “operators.”

Each operator acts on two colors, called “A” and “B”, to produce a composited color. In this article we will only consider one of Porter and Duff’s operators: the A over B operator. This operator has the effect of compositing A in front of B, making A the foreground and B the background.

When we implement alpha blending below, our background will be the existing contents of the frame buffer, and the foreground will be the current fragment. When the alpha component of the fragment is 0 (fully transparent), the background remains as-is. When the alpha component is 1 (opaque), the fragment completely replaces the background. Alpha values between 0 and 1 cause the colors to be blended together according to the foreground opacity.

Premultiplication

We say that an image is premultiplied when the RGB components of its pixels have been multiplied by the alpha component. For example, a premultiplied pixel that is semi-transparent red would have normalized components (0.5, 0, 0, 0.5). A nonpremultiplied semi-transparent red pixel would have components (1, 0, 0, 0.5).
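Converting between the two representations is a single multiplication per color channel. Here is a minimal sketch in plain Swift (the `Color` type and `premultiply` function are our own, not Metal API):

```swift
// A minimal RGBA color with normalized Float components.
struct Color {
    var r, g, b, a: Float
}

// Convert a nonpremultiplied color to premultiplied form by
// scaling the RGB components by alpha. Alpha itself is unchanged.
func premultiply(_ c: Color) -> Color {
    Color(r: c.r * c.a, g: c.g * c.a, b: c.b * c.a, a: c.a)
}

let semiTransparentRed = Color(r: 1, g: 0, b: 0, a: 0.5)
let p = premultiply(semiTransparentRed)
// p is (0.5, 0, 0, 0.5), matching the example above.
```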

Textures and render targets should contain premultiplied colors. The reason has to do with interpolation. Suppose you have two neighboring texels, one opaque red (1, 0, 0, 1) and one transparent black (0, 0, 0, 0). If the GPU samples the texture halfway between these two texels, linear interpolation produces the value (0.5, 0, 0, 0.5). If we treat this color as premultiplied, it represents a semi-transparent red. On the other hand, if we treat it as nonpremultiplied, we would weight it by its alpha during compositing, which would result in the final color being abnormally dark.
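We can see this numerically with a sketch of sampling halfway between those two texels (plain Swift, no Metal required; the names are our own):

```swift
struct Texel { var r, g, b, a: Float }

// Component-wise linear interpolation between two texels,
// just as the GPU's bilinear filter would compute at t = 0.5.
func lerp(_ x: Texel, _ y: Texel, _ t: Float) -> Texel {
    Texel(r: x.r + (y.r - x.r) * t,
          g: x.g + (y.g - x.g) * t,
          b: x.b + (y.b - x.b) * t,
          a: x.a + (y.a - x.a) * t)
}

let opaqueRed = Texel(r: 1, g: 0, b: 0, a: 1)
let transparentBlack = Texel(r: 0, g: 0, b: 0, a: 0)
let sampled = lerp(opaqueRed, transparentBlack, 0.5)
// sampled is (0.5, 0, 0, 0.5).

// Premultiplied interpretation: the RGB value is used as-is
// during compositing, giving a red contribution of 0.5.
let premultipliedRed = sampled.r                  // 0.5

// Nonpremultiplied interpretation: the RGB value is weighted by
// alpha again, darkening the contribution to 0.25 — the artifact.
let doublyWeightedRed = sampled.r * sampled.a     // 0.25
```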

Alpha Blending

Now that we know about compositing and premultiplied alpha in the abstract, let’s talk about how transparency is implemented by GPUs.

Alpha blending is the process of combining together the contents of the render buffer with the color returned by the fragment function. In most APIs, this blending is a fixed-function operation, meaning we configure it rather than writing a shader to do it.

Rather than using the foreground/background terminology from compositing, we will refer to the fragment color as the source and the render target color as the destination. This emphasizes that the fragment is being composited into the frame buffer by the alpha blending process.

Configuring a GPU for alpha blending consists of choosing blending factors and operations. Factors determine the respective weights of the inputs, while operations determine how the weighted contributions are combined. We might write the fully general blending operation as a set of two equations, one that determines the composited RGB components, and one that determines the composited alpha component:

RGB_out = op_rgb(F_src × RGB_src, F_dst × RGB_dst)
A_out = op_alpha(F_srcA × A_src, F_dstA × A_dst)
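For one color channel, the general form can be sketched in code with the factors and the combining operation as parameters (all names here are our own for illustration, not Metal API):

```swift
// One channel of the general blending equation:
// result = op(srcFactor * src, dstFactor * dst)
func blendChannel(src: Float, dst: Float,
                  srcFactor: Float, dstFactor: Float,
                  op: (Float, Float) -> Float) -> Float {
    op(srcFactor * src, dstFactor * dst)
}

// The "add" operation, by far the most common choice.
let add: (Float, Float) -> Float = (+)

// Example: one channel of source-over with premultiplied colors —
// the source factor is 1, the destination factor is (1 − srcAlpha).
let srcAlpha: Float = 0.5
let out = blendChannel(src: 0.5, dst: 1.0,
                       srcFactor: 1, dstFactor: 1 - srcAlpha,
                       op: add)
// out is 1.0, i.e. 0.5 + 0.5 × 1.0.
```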

This is horribly abstract, though. As an example, let’s consider how to implement Porter and Duff’s A-over-B operator, which we might call “source-over-destination,” or just “source-over” blending.

Our colors are assumed to be premultiplied, so we will not multiply the source by its alpha component — the source factor is simply one. Because we want the foreground to cover the background proportional to the alpha component of the fragment, the destination factor is one minus the source’s alpha. We will use the same factors for the RGB and alpha channels and combine them with addition.

Our simplified equations for source-over blending are these:

RGB_out = RGB_src + (1 − A_src) × RGB_dst
A_out = A_src + (1 − A_src) × A_dst

This looks just like ordinary linear interpolation, except that the source factor is one rather than the source alpha. This difference is accounted for by premultiplication.
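As a sanity check, we can apply source-over blending to whole premultiplied colors and confirm the extremes described earlier (a plain Swift sketch; the `RGBA` type and `sourceOver` function are our own):

```swift
struct RGBA { var r, g, b, a: Float }

// Source-over blending of premultiplied colors:
// out = src + (1 − src.a) × dst, applied to every component.
func sourceOver(src: RGBA, dst: RGBA) -> RGBA {
    let k = 1 - src.a
    return RGBA(r: src.r + k * dst.r,
                g: src.g + k * dst.g,
                b: src.b + k * dst.b,
                a: src.a + k * dst.a)
}

let background = RGBA(r: 0, g: 0, b: 1, a: 1)   // opaque blue

// Fully transparent fragment: the background is unchanged.
let clear = sourceOver(src: RGBA(r: 0, g: 0, b: 0, a: 0), dst: background)

// Opaque red fragment: the background is replaced.
let opaque = sourceOver(src: RGBA(r: 1, g: 0, b: 0, a: 1), dst: background)

// Half-transparent (premultiplied) red: an even blend.
let mixed = sourceOver(src: RGBA(r: 0.5, g: 0, b: 0, a: 0.5), dst: background)
// mixed is (0.5, 0, 0.5, 1).
```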

Alpha Blending in Metal

In Metal, alpha blending is achieved by configuring the render pipeline state for each color attachment.

We first enable blending for an attachment by setting its isBlendingEnabled property:

renderPipelineDescriptor.colorAttachments[0]
.isBlendingEnabled = true

We then set the factors and operations for the source and destination terms of the RGB and alpha channels respectively. Here is how we configure premultiplied source-over blending:

renderPipelineDescriptor.colorAttachments[0]
.sourceRGBBlendFactor = .one
renderPipelineDescriptor.colorAttachments[0]
.destinationRGBBlendFactor = .oneMinusSourceAlpha
renderPipelineDescriptor.colorAttachments[0]
.rgbBlendOperation = .add

renderPipelineDescriptor.colorAttachments[0]
.sourceAlphaBlendFactor = .one
renderPipelineDescriptor.colorAttachments[0]
.destinationAlphaBlendFactor = .oneMinusSourceAlpha
renderPipelineDescriptor.colorAttachments[0]
.alphaBlendOperation = .add

You may recall that we have already been returning premultiplied colors from our fragment shaders, in anticipation of using alpha blending:

return float4(litColor * baseColor.a, baseColor.a);

Enabling blending on the render pipeline state and returning transparent colors from the fragment function are sufficient to achieve alpha blended results.

Our sample scene consists of a model of Earth in space. The opaque planetary surface is surrounded by a transparent layer of clouds, and a faint, transparent glow wraps the planet to emphasize the atmosphere’s depth.

We load the planet scene from three separate OBJ files so we can move the parts independently. Each frame, we rotate the earth and cloud spheres to suggest the passage of time, while also orienting the plane containing the atmospheric glow so it always faces the camera.

Running the sample app gives us a view of home from a distance.

Order Dependency in Blending

Not all is right in the world, however. There is a dirty little secret in transparency that makes getting correct blended results much harder. The inconvenient truth is that order matters when compositing. If we combine multiple layers of transparent materials together in an arbitrary order, we will not get a correct result.

Mathematically, we say that the blending equations are not commutative: in general, operation f followed by operation g is not the same as operation g followed by operation f. This implies that we need to enforce an order for our transparent surfaces. In most cases, we want to render surfaces from far to near (i.e., from back to front).
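We can verify this order dependence directly by compositing two half-transparent layers over an opaque background in both orders (a plain Swift sketch; `over` implements source-over blending of premultiplied colors):

```swift
struct RGBA { var r, g, b, a: Float }

// Source-over blending of premultiplied colors.
func over(_ src: RGBA, _ dst: RGBA) -> RGBA {
    let k = 1 - src.a
    return RGBA(r: src.r + k * dst.r, g: src.g + k * dst.g,
                b: src.b + k * dst.b, a: src.a + k * dst.a)
}

let background = RGBA(r: 0, g: 0, b: 0, a: 1)    // opaque black
let red   = RGBA(r: 0.5, g: 0, b: 0, a: 0.5)     // premultiplied 50% red
let green = RGBA(r: 0, g: 0.5, b: 0, a: 0.5)     // premultiplied 50% green

// Draw red first, then green in front of it…
let greenInFront = over(green, over(red, background))  // (0.25, 0.5, 0, 1)

// …versus green first, then red in front.
let redInFront = over(red, over(green, background))    // (0.5, 0.25, 0, 1)

// The two orderings produce different colors, so draw order matters.
```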

Sorting surfaces from back to front works in many cases. However, sorting transparent surfaces suffers from the same issues we saw when contemplating the painter’s algorithm and the z-buffer: it is not always possible to sort triangles unambiguously by depth. Even when it is possible, sorting a dynamic scene every frame can be prohibitively expensive.

There have been many attempts over the years to implement “order-independent transparency” (OIT), but every possible approach has downsides. We handily avoided ordering issues in our sample scene by manually determining the order of the transparent elements, but this is not always possible in dynamic scenes. If you are interested in various mitigations for order dependency, I recommend consulting the bibliography of McGuire and Bavoil’s 2013 paper.

Next time, we will consider how to add higher fidelity to our transparent objects with environment-mapped reflection and refraction.
